This section demonstrates how to use treeio to parse a tree and its associated data into a single object in R.
This lesson assumes a basic familiarity with R and data frames.
This lesson does not cover methods and software for generating phylogenetic trees, nor does it cover interpreting phylogenies. Here’s a quick primer on how to read a phylogeny, which you should review before this lesson, although it is by no means exhaustive. Genome-wide sequencing allows the entire genome to be examined, and many methods and software tools exist for comparative genomics using SNP- and gene-based phylogenetic analysis, starting from unassembled sequencing reads, draft assemblies/contigs, or complete genome sequences. These methods are beyond the scope of this lesson.
treeio Package

treeio is an R package designed for phylogenetic tree data input and output. It is released as part of the Bioconductor and rOpenSci projects.
Just like R packages from CRAN, you only need to install Bioconductor packages once (instructions here), then load them every time you start a new R session.
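For reference, the one-time installation usually looks like the sketch below. It assumes the standard Bioconductor route through the BiocManager package; the linked instructions remain the authoritative source if your setup differs.

```r
# One-time setup: install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install treeio from Bioconductor (only needed once per R installation)
BiocManager::install("treeio")
```

After this, only the `library()` call needs to be repeated in each new R session.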
library(treeio)
Most tree viewer software (including R packages) focuses on the Newick and NEXUS file formats. The output of other evolutionary analysis software may also contain supporting evidence and/or analysis findings that can be further analyzed in R or interpreted in a phylogenetic context to help identify evolutionary patterns.
treeio supports several file formats, including:
and software output from:
The treeio package implements several parser functions:
| Parser function | Description |
|---|---|
| read.beast | parsing output of BEAST |
| read.codeml | parsing output of CodeML (rst and mlc files) |
| read.codeml_mlc | parsing mlc file (output of CodeML) |
| read.hyphy | parsing output of HYPHY |
| read.jplace | parsing jplace file including output of EPA and pplacer |
| read.mrbayes | parsing output of MrBayes |
| read.newick | parsing newick string, with ability to parse node label as support values |
| read.nhx | parsing NHX file including output of PHYLDOG and RevBayes |
| read.paml_rst | parsing rst file (output of BaseML or CodeML) |
| read.phylip | parsing phylip file (phylip alignment + newick string) |
| read.r8s | parsing output of r8s |
| read.raxml | parsing output of RAxML |
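As a minimal sketch of how one of these parsers is called, the example below uses `read.newick` with `node.label = "support"` so that internal node labels are read as support values. The toy tree, its tip names, and the support values (90 and 75) are made up for illustration; the string is written to a temporary file so the example does not depend on any external data.

```r
library(treeio)

# Hypothetical toy tree: the internal node labels (90, 75) stand in for support values
newick <- "((A:1,B:1)90:1,(C:1,D:1)75:1);"

# Write the Newick string to a temporary file so it can be parsed like any tree file
tree_file <- tempfile(fileext = ".nwk")
writeLines(newick, tree_file)

# Parse the file, treating node labels as support values rather than plain labels
tree <- read.newick(tree_file, node.label = "support")

# Printing the object shows the tree together with the parsed data
tree
```

The other parsers in the table follow the same pattern: pass the path to the software's output file and assign the result to an object.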
After parsing, the tree structure and its associated data are stored in an S4 class, treedata, defined in the treeio package. The parsed data are mapped to the tree's branches and nodes inside the treedata object, so that they can be used efficiently to annotate the tree visually with the ggtree package.
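To make the structure of a treedata object more concrete, here is a small sketch that parses a made-up Newick tree and then looks inside the result. The tree and its support values are hypothetical, and accessing the `phylo` and `data` slots directly is shown only for illustration of how the data sit alongside the tree.

```r
library(treeio)

# Hypothetical toy tree with support values as internal node labels
tree_file <- tempfile(fileext = ".nwk")
writeLines("((A:1,B:1)90:1,(C:1,D:1)75:1);", tree_file)
tree <- read.newick(tree_file, node.label = "support")

# treedata is an S4 class, so the object is an S4 instance
isS4(tree)

# The tree topology is kept as a regular phylo object
phylo <- as.phylo(tree)
phylo$tip.label

# Names of the data fields that were parsed alongside the tree
get.fields(tree)

# The associated data, keyed by node number so it maps onto the tree
tree@data
```

Because the data are keyed by node, they stay attached to the correct branches and nodes when the object is passed on to plotting code.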